Proposal and evaluation of FASDIM, a Fast And Simple De-Identification Method for unstructured free-text clinical records
Abstract
PURPOSE Medical free-text records provide rich information about patients, but they often need to be de-identified by removing Protected Health Information (PHI) whenever identifying the patient is not required. Pattern-matching techniques require predefined dictionaries, and machine-learning techniques require an extensive training set. Methods exist for French, but they either yield weak results or are not freely available. The objective is to define and evaluate FASDIM, a Fast And Simple De-Identification Method for French medical free-text records.
METHODS FASDIM consists of removing all words that are not present in a list of authorized words, and removing all numbers except those that match a list of protection patterns. Both lists are extended over successive iterations of the method. For the evaluation, the workload is estimated as the records are de-identified. The efficiency of the de-identification is assessed by independent medical experts on 508 discharge letters that are randomly selected and de-identified by FASDIM. Finally, the letters are encoded before and after de-identification according to 3 terminologies (ATC, ICD10, CCAM), and the codes are compared.
RESULTS The list of authorized words is built progressively: 12 h for the first 7,000 letters, then 16 additional hours for 20,000 additional letters. The recall (proportion of PHI that is removed) is 98.1%, the precision (proportion of PHI among the removed tokens) is 79.6%, and the F-measure (their harmonic mean) is 87.9%. On average, 30.6 terminology codes are encoded per letter, and 99.02% of those codes are preserved despite the de-identification.
CONCLUSION FASDIM achieves good results in French and is freely available. It is easy to implement and does not require any predefined dictionary.
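The filtering rule described in METHODS can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the `[REMOVED]` placeholder, the function names `deidentify` and `f_measure`, the sample sentence, the word list, and the single dose pattern are all assumptions made for illustration. The sketch keeps a word only if it is in the authorized list, keeps a number only if it lies inside a match of a protection pattern, and also recomputes the reported F-measure from the reported recall and precision.

```python
import re

# Hypothetical placeholder written in place of every removed token.
PLACEHOLDER = "[REMOVED]"

# Assumed tokenizer: words (letters only), numbers (optionally joined by . , /),
# whitespace runs, or any other single character.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+(?:[.,/]\d+)*|\s+|.")


def deidentify(text, authorized_words, protection_patterns):
    """Remove words outside the authorized list and numbers outside protected spans."""
    # Spans of text matched by a protection pattern (e.g. a drug dose).
    protected = [m.span() for p in protection_patterns for m in p.finditer(text)]
    out = []
    for m in TOKEN_RE.finditer(text):
        tok = m.group()
        if tok[0].isdigit():
            # Numbers survive only when fully inside a protected span.
            keep = any(s <= m.start() and m.end() <= e for s, e in protected)
        elif tok[0].isalpha():
            # Words survive only when present in the authorized list.
            keep = tok.lower() in authorized_words
        else:
            # Whitespace and punctuation are kept unchanged.
            keep = True
        out.append(tok if keep else PLACEHOLDER)
    return "".join(out)


def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only; real lists would be built iteratively from the corpus.
authorized = {"m", "ans", "sous", "amoxicilline", "mg", "depuis", "le"}
patterns = [re.compile(r"\d+(?:[.,]\d+)?\s*(?:mg|g|ml)\b")]
sample = "M. Dupont, 67 ans, sous amoxicilline 500 mg depuis le 12/03/2013."

print(deidentify(sample, authorized, patterns))
# M. [REMOVED], [REMOVED] ans, sous amoxicilline 500 mg depuis le [REMOVED].
print(round(100 * f_measure(0.981, 0.796), 1))
# 87.9
```

In this sketch the surname, the age, and the date are removed, while the dose is kept because it matches the assumed protection pattern. An unknown word is removed by default, which is consistent with the high recall and lower precision reported in RESULTS.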
Related articles
Combining knowledge- and data-driven methods for de-identification of clinical narratives
A recent promise to access unstructured clinical data from electronic health records on large-scale has revitalized the interest in automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity type...
Is De-identification of Electronic Health Records Possible? OR Can We Use Health Record Corpora for Research?
Today an immense volume of electronic health records (EHRs) is being produced. These health records contain abundant information, in the form of both structured and unstructured data. It is estimated that EHRs contain on average around 60 percent structured information, and 40 percent unstructured information that is mostly free text (Dalianis et al., 2009). A modern health record is very compl...
Anonimytext: Anonimization of Unstructured Documents
The anonymization of unstructured texts is nowadays a task of great importance in several text mining applications. Medical records anonymization is needed both to preserve personal health information privacy and enable further data mining efforts. The described ANONYMITEXT system is designed to de identify sensible data from unstructured documents. It has been applied to Spanish clinical notes...
Robust Fractional Order Control of Under-actuated Electromechanical System
This paper presents a robust fractional order controller for flexible-joint electrically driven robots under imperfect transformation of control space. The proposed approach is free from manipulator dynamics, thus free from problems associated with torque control strategy in the design and implementation. As a result, the proposed controller is simple, fast response and superior to the torque c...
Journal title:
- International Journal of Medical Informatics
Volume 83, Issue 4
Pages -
Publication date: 2014